Fix VAE offload encode device mismatch in DreamBooth scripts #13417

Merged
sayakpaul merged 2 commits into huggingface:main from azolotenkov:fix-flux2-vae-offload-device-mismatch
Apr 6, 2026
@azolotenkov
Contributor

What does this PR do?

Fixes a device mismatch in the Flux2 DreamBooth training scripts when --offload is used and latents are not precomputed.

In the non-cached path, pixel_values was cast to vae.dtype but not moved to accelerator.device before vae.encode(...). Under offload, this can leave the inputs and the VAE weights on different devices.
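
The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the training scripts: `TinyEncoder` is a hypothetical stand-in for the VAE, a plain `torch.device` stands in for accelerator.device, and the batch is assumed to start on the CPU.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the VAE; the real scripts call vae.encode(...)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    @property
    def dtype(self):
        return self.conv.weight.dtype

    def encode(self, x):
        return self.conv(x)


# Stand-in for accelerator.device (assumption: falls back to CPU without CUDA).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

vae = TinyEncoder().to(device)
batch = {"pixel_values": torch.randn(1, 3, 64, 64)}  # assumed to live on the CPU

# Move the inputs to the target device and dtype in one call before encoding,
# so they match wherever the encoder weights currently are.
pixel_values = batch["pixel_values"].to(device=device, dtype=vae.dtype)
with torch.no_grad():
    latents = vae.encode(pixel_values)
```

Passing both `device` and `dtype` to a single `.to(...)` call keeps the move and the cast together, which makes it harder to update one and forget the other.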

Reproduction:

accelerate launch \
  examples/dreambooth/train_dreambooth_lora_flux2_klein.py \
  --pretrained_model_name_or_path hf-internal-testing/tiny-flux2-klein \
  --instance_data_dir /data \
  --instance_prompt "a photo of sks dog" \
  --resolution 64 \
  --train_batch_size 1 \
  --max_train_steps 1 \
  --text_encoder_out_layers 1 \
  --offload

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Before submitting

Who can review?

@sayakpaul

Copilot AI review requested due to automatic review settings April 5, 2026 20:31
Contributor

Copilot AI left a comment

Pull request overview

Fixes a device mismatch when VAE offloading is enabled in the DreamBooth LoRA training example scripts by ensuring pixel_values (and cond_pixel_values for the img2img variants) are moved to accelerator.device before calling vae.encode(...) in the non-latent-cached path.

Changes:

  • Move batch["pixel_values"] to device=accelerator.device (in addition to dtype=vae.dtype) before VAE encoding under offload_models(...).
  • For img2img variants, also move cond_pixel_values to accelerator.device and ensure both vae.encode(...) calls occur inside the offload context.
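
The two changes listed above might look roughly like this for an img2img variant. This is a sketch under stated assumptions: `temporarily_on_device` is a hypothetical helper written for illustration and is not the offload_models implementation, and a single `nn.Conv2d` stands in for the VAE.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def temporarily_on_device(module, device, offload=True):
    """Hypothetical helper: keep `module` on `device` for the block, then return it to CPU."""
    if offload:
        module.to(device)
    try:
        yield
    finally:
        if offload:
            module.to("cpu")


# Stand-in for accelerator.device (assumption: falls back to CPU without CUDA).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

vae = nn.Conv2d(3, 4, kernel_size=3, padding=1)  # stand-in for the VAE, starts on CPU
batch = {
    "pixel_values": torch.randn(1, 3, 64, 64),
    "cond_pixel_values": torch.randn(1, 3, 64, 64),
}

with temporarily_on_device(vae, device), torch.no_grad():
    dtype = vae.weight.dtype
    # Both inputs are moved to the same device as the weights...
    pixel_values = batch["pixel_values"].to(device=device, dtype=dtype)
    cond_pixel_values = batch["cond_pixel_values"].to(device=device, dtype=dtype)
    # ...and both encode calls happen while the module is still on that device.
    latents = vae(pixel_values)
    cond_latents = vae(cond_pixel_values)
```

Keeping both encode calls inside the context matters because the module is returned to the CPU as soon as the block exits.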

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/dreambooth/train_dreambooth_lora_z_image.py | Moves pixel_values to accelerator.device before vae.encode in the non-cached path. |
| examples/dreambooth/train_dreambooth_lora_flux2.py | Same device move for Flux2 DreamBooth LoRA training when not using cached latents. |
| examples/dreambooth/train_dreambooth_lora_flux2_klein.py | Same device move for Flux2 Klein variant when not using cached latents. |
| examples/dreambooth/train_dreambooth_lora_flux2_klein_img2img.py | Moves both pixel_values and cond_pixel_values to accelerator.device and encodes inside offload context. |
| examples/dreambooth/train_dreambooth_lora_flux2_img2img.py | Moves both pixel_values and cond_pixel_values to accelerator.device and encodes inside offload context. |

Member

@sayakpaul left a comment

Thanks a lot!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul
Member

Failing tests are unrelated.

@sayakpaul merged commit d31061b into huggingface:main on Apr 6, 2026
25 of 26 checks passed
4 participants